Measurement Context Extraction from Text: Discovering Opportunities and Gaps in Earth Science

Authors

  • Kyle Hundman
  • Chris Mattmann
Abstract

We propose Marve, a system for extracting measurement values, units, and related words from natural language text. Marve uses conditional random fields (CRF) to identify measurement values and units, followed by a rule-based system to find related entities, descriptors and modifiers within a sentence. Sentence tokens are represented by an undirected graphical model, and rules are based on part-of-speech and word dependency patterns connecting values and units to contextual words. Marve is unique in its focus on measurement context, and early experimentation demonstrates Marve’s ability to generate high-precision extractions with strong recall. We also discuss Marve’s role in refining measurement requirements for NASA’s proposed HyspIRI mission, a hyperspectral infrared imaging satellite that will study the world’s ecosystems. In general, our work with HyspIRI demonstrates the value of semantic measurement extractions in characterizing quantitative discussion contained in large corpora of natural language text. These extractions accelerate broad, cross-cutting research and expose scientists to new algorithmic approaches and experimental nuances. They also facilitate identification of scientific opportunities enabled by HyspIRI, leading to more efficient scientific investment and research.
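The two-stage idea in the abstract (first find values and units, then attach a related word) can be illustrated with a toy sketch. This is not Marve's implementation: a regular expression stands in for the CRF, and a "nearest preceding non-stopword" heuristic stands in for the part-of-speech and dependency rules. The `UNITS` list, the `STOPWORDS` set, the `Measurement` type, the `extract` function, and the example sentence are all hypothetical and chosen only for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class Measurement(NamedTuple):
    value: float
    unit: str
    related: Optional[str]  # contextual word linked to the measurement


# Assumed, tiny unit vocabulary; longer units come before "m" so they win.
UNITS = ["nm", "km", "kg", "m", "%"]
# Assumed stopwords skipped when looking for a related word.
STOPWORDS = {"a", "an", "the", "of", "at", "is", "was", "about"}

# Stage 1 stand-in: a regex (not a CRF) that finds "<number> <unit>".
# The lookahead rejects units that are only a prefix of a longer word.
MEASUREMENT_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*("
    + "|".join(re.escape(u) for u in UNITS)
    + r")(?![A-Za-z])"
)


def extract(sentence: str) -> List[Measurement]:
    """Return measurements with a heuristically chosen related word."""
    results = []
    for match in MEASUREMENT_RE.finditer(sentence):
        # Stage 2 stand-in: nearest preceding non-stopword
        # (not a dependency-pattern rule).
        preceding = re.findall(r"[A-Za-z]+", sentence[: match.start()])
        related = next(
            (w for w in reversed(preceding) if w.lower() not in STOPWORDS),
            None,
        )
        results.append(
            Measurement(float(match.group(1)), match.group(2), related)
        )
    return results


if __name__ == "__main__":
    # Made-up sentence, used only to exercise the sketch.
    text = "The sensor has a swath of 150 km and a resolution of 60 m."
    for m in extract(text):
        print(m)
```

A real system would replace both stand-ins: a trained sequence labeler for the first stage and rules over a dependency parse for the second, which is where the precision described in the abstract would come from.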


Similar resources

A deconstructive critique of a mystical anecdote from the book Ronaq al-Majalis [The Prosperity of Meetings]

Deconstruction was first introduced in the thought of Jacques Derrida as a way of re-reading texts and questioning their presuppositions. This type of critique seeks to find new meanings by finding binary oppositions in the text and disrupting the superiority and domination of one side over the other, and on the other hand, by discovering gaps and discontinuities that have arisen in the text...


Discovering the Underlying Components Affecting the Usability of IoT in Iranian Libraries: A Theory Based on Context

Objective: The aim is to discover the underlying contextual components of IoT usability in Iranian libraries, using a qualitative approach consistent with grounded theory. Method: This qualitative study was conducted based on grounded theory. Data were collected through semi-structured interviews with 13 faculty members of knowledge and information science, selected by purposeful and chain sampling methods. Responsi...


Learning Context for Text Categorization

This paper describes our work on discovering context for text document categorization. The document categorization approach is derived from a combination of a learning paradigm known as relation extraction and a technique known as context discovery. We demonstrate the effectiveness of our categorization approach using the Reuters-21578 dataset and synthetic real world data from spor...


Reconstruction of Data Gaps in Total-Ozone Records with a New Wavelet Technique

This study introduces a new technique to fill and reconstruct daily observational total-ozone records that contain void data for some days. The technique is based on wavelet theory, a linear time-frequency transformation that has been considered in various fields of science, especially in Earth and space physics and in the processing of observational data related to the Earth and space sciences. The initial c...


Structural Linguistics and Unsupervised Information Extraction

A precondition for extracting information from large text corpora is discovering the information structures underlying the text. Progress in this direction is being made in the form of unsupervised information extraction (IE). We describe recent work in unsupervised relation extraction and compare its goals to those of grammar discovery for science sublanguages. We consider what this work on gr...



Journal title:
  • CoRR

Volume abs/1710.04312

Publication date 2017